-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Arm64] ASIMD By Element Intrinsics #36916
[Arm64] ASIMD By Element Intrinsics #36916
Conversation
Note regarding the This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change. |
Tagging subscribers to this area: @tannergooding |
522dbaa
to
f745a7e
Compare
The API part of this PR is ready for review. This PR includes most of the "by element" intrinsics (excluding the ones that have not been approved yet) and some load/store intrinsic that operate on a single SIMD element. The reason for combining these into one PR is that I found that the JIT changes that I need to make to support all of these are tightly coupled to each other and I was trying to not spread them across many PRs. @CarolEidt @tannergooding @TamarChristinaArm Can you please take a look while I am working on finishing the JIT part of the changes? |
f745a7e
to
ac6d6c0
Compare
… return value type in AdvSimd.cs
…latformNotSupported.cs
…AdvSimd.cs AdvSimd.PlatformNotSupported.cs
…n AdvSimd.cs AdvSimd.PlatformNotSupported.cs
…rmNotSupported.cs
…edScalar in AdvSimd.cs AdvSimd.PlatformNotSupported.cs
…md.PlatformNotSupported.cs
…AdvSimd.cs AdvSimd.PlatformNotSupported.cs
…dvSimd.cs AdvSimd.PlatformNotSupported.cs
…Arm32 in AdvSimd.cs AdvSimd.PlatformNotSupported.cs
…ormNotSupported.cs
….PlatformNotSupported.cs
a6cc4cd
to
283e686
Compare
@CarolEidt @tannergooding This is ready for review now. Please take a look when you have time. I rebased on top of the changes in #37859 |
I'll start giving it a look this afternoon. Thanks! |
@@ -817,7 +817,7 @@ void emitter::emitInsSanityCheck(instrDesc* id) | |||
assert(isVectorRegister(id->idReg2())); | |||
assert(isVectorRegister(id->idReg3())); | |||
elemsize = optGetElemsize(id->idInsOpt()); | |||
assert(isValidVectorIndex(id->idOpSize(), elemsize, emitGetInsSC(id))); | |||
assert(isValidVectorIndex(EA_16BYTE, elemsize, emitGetInsSC(id))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this changed to account for the simd8 result with simd16 selection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The size of the destination/source registers shouldn't matter here. For example,
FMLA Vd.T, Vn.T, Vm.Ts[index]
index is always encoded as H:L (2 bits) when T is either 2S or 4S (and Ts is S).
In other words, the range of valid values for index is computed based on the assumption that Vm is 16 bytes.
@@ -4728,8 +4728,8 @@ struct GenTreeJitIntrinsic : public GenTreeOp | |||
ClassLayout* m_layout; | |||
|
|||
union { | |||
var_types gtOtherBaseType; // For AVX2 Gather* intrinsics | |||
regNumberSmall gtOtherReg; // For intrinsics that return 2 registers | |||
var_types gtAuxiliaryType; // For intrinsics than need another type (e.g. Avx2.Gather* or SIMD (by element)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why rename to AuxiliaryType
but not AuxiliaryReg
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I didn't touch gtOtherReg in other files.
The intent of the renaming was to get rid of BaseType since for SIMD By Element intrinsics I use this field to encode a SIMD type of an indexed element while in some other case we do use it to keep the "other" base type (e.g. in case of wide/long intrinsics). I could've renamed it to gtOtherType.
Do you want me to rename the gtOtherReg -> gtAuxiliaryReg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was mostly just interested. This makes sense, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the renaming. I think it's reasonable to keep the reg as gtOtherReg
, as that naming is used on other GenTree
nodes. In any case, if we didn't change it to gtAuxiliaryType
we might want to name it gtOtherType
(though I'm happy with this change as-is).
static bool SIMDScalar(NamedIntrinsic id) | ||
{ | ||
const HWIntrinsicFlag flags = lookupFlags(id); | ||
return (flags & HW_Flag_SIMDScalar) != 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why a flag, rather than a category?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because, many intrinsics can have features of SIMDScalar and some other category:
- SIMD ShiftLeftByImmediate + SIMDScalar
- SIMD ShiftRightByImmediate + SIMDScalar
- SIMD ByIndexedElement + SIMDScalar
and I didn't want to have all possible combinations of those.
In addition, on Arm64 flag SIMDScalar is going to be used only in CodeGen to set INS_OPTS_NONE (i.e. switch between vector variant to scalar variant for a given instruction)
while a category value is used in multiple phases (e.g. ByIndexedElement is used in LSRA, Lower and Importer) for making various decisions (what registers can be used for an indexed element, how to perform containment analysis etc.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Makes sense to me.
HARDWARE_INTRINSIC(AdvSimd, Xor, -1, 2, {INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor}, HW_Category_SimpleSIMD, HW_Flag_Commutative) | ||
HARDWARE_INTRINSIC(AdvSimd, ZeroExtendWideningLower, 8, 1, {INS_uxtl, INS_uxtl, INS_uxtl, INS_uxtl, INS_uxtl, INS_uxtl, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SimpleSIMD, HW_Flag_BaseTypeFromFirstArg) | ||
HARDWARE_INTRINSIC(AdvSimd, ZeroExtendWideningUpper, 16, 1, {INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SimpleSIMD, HW_Flag_BaseTypeFromFirstArg) | ||
HARDWARE_INTRINSIC(AdvSimd, Abs, -1, 1, {INS_abs, INS_invalid, INS_abs, INS_invalid, INS_abs, INS_invalid, INS_invalid, INS_invalid, INS_fabs, INS_invalid}, HW_Category_SIMD, HW_Flag_BaseTypeFromFirstArg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Git stops reporting changes for this file after this line. Is it largely just whitespace and flags/category changes to match the previous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unfortunate, so I can only suggest to review in commit-by-commit fashion.
Most of the changes come from this commit so to answer your question - yes, this is to account for flags/category change. I also sorted the values in flags column in alphabetic order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also seems that you added HW_Flag_NoFloatingPointUsed
in some places. I confess that I probably haven't carefully scrutinized the changes in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also seems that you added HW_Flag_NoFloatingPointUsed in some places. I confess that I probably haven't carefully scrutinized the changes in this file.
Yes, I added this flag to the intrinsics that operate on general-purpose registers (e.g. crc32h, rbit, clz)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall - a couple of minor suggestions and request for comment.
@@ -3782,7 +3782,7 @@ class Compiler | |||
GenTree* getArgForHWIntrinsic(var_types argType, CORINFO_CLASS_HANDLE argClass, bool expectAddr = false); | |||
GenTree* impNonConstFallback(NamedIntrinsic intrinsic, var_types simdType, var_types baseType); | |||
GenTree* addRangeCheckIfNeeded( | |||
NamedIntrinsic intrinsic, GenTree* lastOp, bool mustExpand, int immLowerBound, int immUpperBound); | |||
NamedIntrinsic intrinsic, GenTree* immOp, bool mustExpand, int immLowerBound, int immUpperBound); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's minor, but thanks for renaming this - even if/when it was the last op, semantically it's more important that it's an immediate.
src/coreclr/src/jit/emitarm64.cpp
Outdated
@@ -5827,7 +5827,6 @@ void emitter::emitIns_R_R_R( | |||
elemsize = EA_1BYTE; | |||
opt = optMakeArrangement(size, elemsize); | |||
} | |||
assert(isValidArrangement(size, opt)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why these asserts were removed - are they invalid or somehow redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! For some reason I mistakenly though that they are redundant - I will revert this commit.
src/coreclr/src/jit/emitarm64.cpp
Outdated
@@ -6368,6 +6366,11 @@ void emitter::emitIns_R_R_R_I(instruction ins, | |||
assert((opt == INS_OPTS_4H) || (opt == INS_OPTS_2S)); | |||
elemsize = optGetElemsize(opt); | |||
assert(isValidVectorIndex(EA_16BYTE, elemsize, imm)); | |||
// Restricted to V0-V15 when element size is H | |||
if ((elemsize == EA_2BYTE) && (reg3 >= REG_V16)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make the conditions consistent, perhaps this could be
if ((elemsize == EA_2BYTE) && (reg3 >= REG_V16)) | |
if ((elemsize == EA_2BYTE) && ((genRegMask(reg3) & RBM_ASIMD_INDEXED_H_ELEMENT_ALLOWED_REGS) == 0)) |
@@ -4728,8 +4728,8 @@ struct GenTreeJitIntrinsic : public GenTreeOp | |||
ClassLayout* m_layout; | |||
|
|||
union { | |||
var_types gtOtherBaseType; // For AVX2 Gather* intrinsics | |||
regNumberSmall gtOtherReg; // For intrinsics that return 2 registers | |||
var_types gtAuxiliaryType; // For intrinsics than need another type (e.g. Avx2.Gather* or SIMD (by element)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the renaming. I think it's reasonable to keep the reg as gtOtherReg
, as that naming is used on other GenTree
nodes. In any case, if we didn't change it to gtAuxiliaryType
we might want to name it gtOtherType
(though I'm happy with this change as-is).
@@ -671,6 +680,50 @@ static bool isSupportedBaseType(NamedIntrinsic intrinsic, var_types baseType) | |||
return false; | |||
} | |||
|
|||
struct HWIntrinsicSignatureReader final |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the comment.
{ | ||
assert(sig->numArgs == 3); | ||
immOp = impStackTop(1).val; | ||
assert(HWIntrinsicInfo::isImmOp(intrinsic, immOp)); | ||
} | ||
else if (intrinsic == NI_AdvSimd_Arm64_InsertSelectedScalar) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this method is already rather large, you might consider extracting this code block, and the one for HW_Category_SIMDByIndexedElement
into separate methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I was thinking about splitting the method into multiple methods - or even have a helper class that solely does hardware intrinsic importation - but this is more major refactoring than I want for this PR and likely affect x86/x64 side of the intrinsics, so I decided to defer such change to a later point when all the Arm64 intrinsics are implemented.
src/coreclr/src/jit/hwintrinsic.cpp
Outdated
#ifdef TARGET_XARCH | ||
if ((intrinsic == NI_SSE42_Crc32) || (intrinsic == NI_SSE42_X64_Crc32)) | ||
{ | ||
// TODO - currently we use the BaseType to bring the type of the second argument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This should probably be TODO-ARM64-Cleanup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this should be TODO-XArch-Cleanup
- I updated the comment
HARDWARE_INTRINSIC(AdvSimd, Xor, -1, 2, {INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor}, HW_Category_SimpleSIMD, HW_Flag_Commutative) | ||
HARDWARE_INTRINSIC(AdvSimd, ZeroExtendWideningLower, 8, 1, {INS_uxtl, INS_uxtl, INS_uxtl, INS_uxtl, INS_uxtl, INS_uxtl, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SimpleSIMD, HW_Flag_BaseTypeFromFirstArg) | ||
HARDWARE_INTRINSIC(AdvSimd, ZeroExtendWideningUpper, 16, 1, {INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SimpleSIMD, HW_Flag_BaseTypeFromFirstArg) | ||
HARDWARE_INTRINSIC(AdvSimd, Abs, -1, 1, {INS_abs, INS_invalid, INS_abs, INS_invalid, INS_abs, INS_invalid, INS_invalid, INS_invalid, INS_fabs, INS_invalid}, HW_Category_SIMD, HW_Flag_BaseTypeFromFirstArg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also seems that you added HW_Flag_NoFloatingPointUsed
in some places. I confess that I probably haven't carefully scrutinized the changes in this file.
… return value type in AdvSimd.PlatformNotSupported.cs
This reverts commit 2dc7cd8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thanks!
Fixes #33683
Fixes #24794 - (by element intrinsics is the remaining part)
Fixes #36300
Fixes #36298
Fixes #33683
Fixes #33490
Fixes #35037 - (InsertSelectedScalar is the remaining part)